ABSTRACT

This document provides a detailed description and analysis of the
recognition algorithms used in the Vicens-Reddy speech recognition system.
1. INTRODUCTION

This document provides a detailed description of the recognition procedures used
in the Vicens-Reddy speech recognition system [1]. It is a sequel to SDC
TM-4652/200, Description and Analysis of the Vicens-Reddy Preprocessing and
Segmentation Algorithms, to which the reader is referred for a description of
the terms and variables used.

Recognition is a method of assigning linguistic labels to the sustained and
transitional segments of the P-matrix. There are 14 such labels for the
sustained segments and one label for the transitional segments. These are
given in Table 1.

Table 1. Labels for Transitional and Sustained Segments

Linguistic Label    Four-Character Name    Type Number
----------------    -------------------    -----------
Transitional        TRAN                    0
Consonant           CNST                    1
Nasal               NASL                    2
Stop                STOP                    3
Burst               BRST                    4
Fricative           FRIC                    5
Vowel type 1        VWL1                    6
Vowel type 2        VWL2                    7
Vowel type 3        VWL3                    8
Vowel type 4        VWL4                    9
Vowel type 5        VWL5                   10
Vowel type 6        VWL6                   11
Vowel type 7        VWL7                   12
Vowel type 8        VWL8                   13
Vowel type 9        VWL9                   14

Note that most of the conventional linguistic groups of phonemes are included
in the table. Other groups, such as glides, have been omitted.*

Recognition is divided into three parts: (1) Primary Classification, (2)
Secondary Classification, and (3) Construction of the R-matrix. Primary
classification is a serial process in which each sustained segment is first
tested to see if it is a fricative; if it is not classified as a fricative,
tests are sequentially performed for the following groups:

    . Vowel
    . Stop
    . Nasal
    . Consonant

The label "consonant" is attached to all those sustained segments not falling
into the other categories. Because of this serial process, the phoneme groups
given in Table 1 are not mutually exclusive. For example, if a sustained
segment satisfies the test for a vowel but could also fulfill the test for a
nasal, it would never be considered a nasal, since the vowel test precedes that
for nasals. Secondary classification regroups adjacent fricatives and adjacent
stops and detects and labels burst segments. Special tests are then performed
to define beginning and ending segments. Finally, an array called the R-matrix,
or feature matrix, is constructed.

2. PRIMARY CLASSIFICATION

Primary classification consists of five steps that sequentially determine
(1) fricatives, (2) vowels, (3) stops, (4) nasals, and (5) consonants.


*It is felt that if the original transitional segments occurring in secondary
segmentation were retained, rather than being extended onto surrounding
sustained segments, they might provide a clue for the existence of glides.

2.1 FRICATIVE DETERMINATION

P(i) is labeled a fricative (i.e., TYPE(i) = 5) if either:

    (1) Z3(i) ≥ 75
        and A1(i) ≤ 20,
or  (2) 60 ≤ Z3(i) < 75,
        A3(i) ≥ A1(i),
        and A1(i) ≤ 20,
or  (3) 45 ≤ Z3(i) < 60,
        A1(i) ≤ 12,
        and A3(i) ≥ A1(i).
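
The three tests can be written as a single predicate. The following is a minimal sketch, not the program's actual code: the function name `is_fricative` is hypothetical, and the inequality directions are the reading suggested by the explanation of the tests (Z3 high, A1 low, A3 at least A1).

```python
def is_fricative(z3: int, a1: int, a3: int) -> bool:
    """Return True if a sustained segment passes any of the three fricative tests.

    Assumed reading of the tests:
      (1) Z3 >= 75 and A1 <= 20
      (2) 60 <= Z3 < 75, A3 >= A1, A1 <= 20
      (3) 45 <= Z3 < 60, A1 <= 12, A3 >= A1
    """
    test1 = z3 >= 75 and a1 <= 20
    test2 = 60 <= z3 < 75 and a3 >= a1 and a1 <= 20
    test3 = 45 <= z3 < 60 and a1 <= 12 and a3 >= a1
    return test1 or test2 or test3


if __name__ == "__main__":
    print(is_fricative(z3=80, a1=15, a3=5))    # passes test (1)
    print(is_fricative(z3=65, a1=15, a3=10))   # fails test (2): A3 < A1
```

Note how the amplitude requirements tighten as the Z3 requirement is relaxed from test (1) to test (3).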

In an attempt to explain the above three tests, we note that fricatives are
generally characterized by a high Z3 frequency and a low A1 amplitude (e.g.,
see [3] and [4]). Consider now the following diagram of the Z3 and A1 ranges:

[Diagram: the Z3 range, marked at 44, 45, 58, 60, 72 (center frequency), 75,
and 100; and the A1 range, divided into lower, middle, and upper thirds.]

For test (1), Z3(i) must be in approximately the upper half of the third
frequency band, and A1(i) must be in the lower third of all possible A1
amplitudes.

For test (2), Z3(i) is required only to be in the second fourth of all possible
Z3 values. A1(i) must also be in the lower third of its range as before.
However, because the constraint on Z3 has been lowered, an additional
condition, that A3(i) ≥ A1(i), has been added.

In test (3), Z3(i) has the nominal requirement to be in the lowest fourth of all
Z3 frequencies. However, the test for A1(i) is now made more stringent: A1(i)
is now required to be in the lowest 20% of all of its possible values. In
addition, we retain the requirement that A3(i) ≥ A1(i) as in test (2).

The condition that A3(i) ≥ A1(i) is illustrated by the energy spectra as given
by Heinz and Stevens [4] (see Figure 1). These spectra indicate that the above
tests are reasonable for the determination of the fricative /ʃ/. However,
they seem inappropriate for a characterization of /s/ since the cutoff for Z3
is 5000 Hz, whereas the spectra indicate that Z3 is actually around
3300 - [illegible] Hz.

2.2 VOWEL DETERMINATION

If P(i) has not been labeled a fricative, it is labeled a vowel if:

    (1) it is a local maximum* (i.e., SXT(i) = 1),

    (2) A1(i) ≥ 16,

    (3) A1(i) + A2(i) + A3(i) ≥ 25,

and

    (4) DUR(i) ≥ 8.

The test for a vowel as given in the program also requires that
    5*DUR(i) + A1(i) + A2(i) + A3(i) ≥ 50.

However, this condition is superfluous since it is automatically implied by
conditions (3) and (4).
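
The redundancy can be checked directly. The sketch below assumes the reading that condition (3) is A1 + A2 + A3 ≥ 25, condition (4) is DUR ≥ 8, and the program's extra condition is 5*DUR + A1 + A2 + A3 ≥ 50; the upper bounds of the search ranges are arbitrary choices made for illustration.

```python
def extra_condition(dur: int, amp_sum: int) -> bool:
    """The additional program test: 5*DUR + (A1 + A2 + A3) >= 50."""
    return 5 * dur + amp_sum >= 50


# Enumerate every (DUR, amplitude sum) pair that satisfies conditions (3) and (4)
# and collect any pair that would fail the extra test.
violations = [
    (dur, amp_sum)
    for dur in range(8, 101)        # condition (4): DUR >= 8 (upper bound arbitrary)
    for amp_sum in range(25, 190)   # condition (3): sum >= 25 (upper bound arbitrary)
    if not extra_condition(dur, amp_sum)
]

if __name__ == "__main__":
    print(violations)  # []
```

The smallest admissible case already gives 5*8 + 25 = 65, which exceeds 50, so no segment passing (3) and (4) can fail the extra test.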

Generally, a vowel is characterized in the literature as a speech segment of
sufficient duration and amplitude. In the present case, this is characterized
by conditions (2), (3), and (4). However, an additional constraint, viz.,
condition (1), is imposed.

Each P(i) found to be a vowel is assigned a type number TYPE(i) as follows:

TYPE(i) =
     6  if Z1(i) < 6       and Z2(i) < 18
     7  if Z1(i) < 6       and 18 ≤ Z2(i) < 27
     8  if Z1(i) < 6       and Z2(i) ≥ 27
     9  if 6 ≤ Z1(i) < 9   and Z2(i) < 18
    10  if 6 ≤ Z1(i) < 9   and 18 ≤ Z2(i) < 27
    11  if 6 ≤ Z1(i) < 9   and Z2(i) ≥ 27
    12  if Z1(i) ≥ 9       and Z2(i) < 18
    13  if Z1(i) ≥ 9       and 18 ≤ Z2(i) < 27
    14  if Z1(i) ≥ 9       and Z2(i) ≥ 27
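
Because the nine cases form a 3-by-3 grid over Z1 and Z2, the type number can be computed arithmetically. This is an illustrative sketch only; the function name `vowel_type` and the row/column decomposition are not taken from the program.

```python
def vowel_type(z1: int, z2: int) -> int:
    """Map (Z1, Z2) to a vowel type number in 6..14.

    Z1 selects one of three rows (< 6, 6..8, >= 9) and Z2 one of three
    columns (< 18, 18..26, >= 27); types are numbered row by row from 6.
    """
    row = 0 if z1 < 6 else (1 if z1 < 9 else 2)
    col = 0 if z2 < 18 else (1 if z2 < 27 else 2)
    return 6 + 3 * row + col


if __name__ == "__main__":
    t = vowel_type(z1=7, z2=20)
    print(t, t - 5)  # type number and vowel subclass (type minus 5)
```

Subtracting 5 from the result gives the vowel subclass number in the range 1 to 9.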


*In searching for a vowel, every SXT(i) = 1 is reset to SXT(i) = 0. That is,
there are no indicators of local maxima left from this point on. For a
detailed description of the meaning of a local maximum, see [2], pp. 18-22.

This is illustrated in Figure 2. If 5 is subtracted from the type number so
that the range is changed from 6-14 to 1-9, the type corresponds to the vowel
subclasses.

Figure 2. Vowel Subclasses (adapted from Vicens [1])

If tests (1), (2), and (3) are satisfied but (4) is not, so that DUR(i) < 8,
then we search the surrounding segments to find the one most likely to be a
vowel. To perform this search, we begin by defining
    AMPLIM = A1(i) + A2(i) + A3(i) -

Then for j = i-1, i-2, ... , we search backwards from P(i) until a P(j) is
found for which either

    A1(j) < 16
or
    A1(j) + A2(j) + A3(j) < max {25, AMPLIM}
or
    TYPE(j) = FRIC
or
    SXT(j) = -1 (i.e., P(j) is a local minimum).

We then let

    K1 = j+1 and DUR1 = DUR(j+1).

A forward search is now made for k = j+1, j+2, ... , SIZEP to find a P(k) for
which

        A1(k) ≥ 16
    and A1(k) + A2(k) + A3(k) ≥ max {25, AMPLIM}
    and TYPE(k) ≠ FRIC
    and SXT(k) ≠ -1 (not a local minimum).

Then if |DUR(k) - DUR1| ≤ 2
        and A1(k) + A2(k) + A3(k) ≥ A1(K1) + A2(K1) + A3(K1),
or if |DUR(k) - DUR1| > 2
        and DUR(k) > DUR1,
then we set DUR1 = DUR(k)
and K1 = k and continue our forward search.

But whenever we find a P(k) for which
        A1(k) < 16
    or  A1(k) + A2(k) + A3(k) < max {25, AMPLIM}
    or  TYPE(k) = FRIC
    or  SXT(k) = -1 (local minimum)
    or  k = SIZEP + 1,

then we consider P(K1) to be the best choice, and if
    5*DUR(K1) + A1(K1) + A2(K1) + A3(K1) ≥ 50,
then we let i = K1 and label P(i) a vowel, using the numbers TYPE(i) as given
above.
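
The backward and forward searches can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the `Seg` record, the function names, and the 0-based list indexing are hypothetical; AMPLIM is passed in as a parameter because its full definition is not reproduced here; and the backward search is assumed to stop at the first row if no stopping segment is found.

```python
from dataclasses import dataclass

FRIC = 5  # type number for fricatives (Table 1)


@dataclass
class Seg:
    a1: int
    a2: int
    a3: int
    dur: int
    type: int = 1
    sxt: int = 0  # -1 marks a local minimum


def amp(s: Seg) -> int:
    return s.a1 + s.a2 + s.a3


def ends_search(s: Seg, limit: int) -> bool:
    """Stopping test shared by the backward and forward searches."""
    return s.a1 < 16 or amp(s) < limit or s.type == FRIC or s.sxt == -1


def best_vowel_candidate(segs, i, amplim):
    """Return the index K1 of the segment to label a vowel, or None."""
    limit = max(25, amplim)
    j = i - 1
    while j >= 0 and not ends_search(segs[j], limit):  # backward search
        j -= 1
    k1, dur1 = j + 1, segs[j + 1].dur
    k = j + 1
    while k < len(segs) and not ends_search(segs[k], limit):  # forward search
        s = segs[k]
        close = abs(s.dur - dur1) <= 2
        if (close and amp(s) >= amp(segs[k1])) or (not close and s.dur > dur1):
            k1, dur1 = k, s.dur
        k += 1
    best = segs[k1]
    return k1 if 5 * best.dur + amp(best) >= 50 else None
```

For example, with segments of durations 3, 4, 6, 9, 4 where the first and last are too weak (A1 < 16), starting from the segment of duration 6 the search settles on the neighbor of duration 9, since it is longer by more than 2 units.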

The literature on acoustic phonetics abounds with papers on vowel
characterizations. Results from a few representative papers have been selected
to help explain Vicens' vowel subclasses. In particular, it is interesting to
compare the present vowel classifications with those obtained by Peterson and
Barney [5] and Forgie and Forgie [6] (see Figures 3 and 4). A glossary of the
phonemic symbols used in Figures 3 and 4 is given in the appendix. A comparison
of Figure 2 with Figures 3 and 4 indicates that the vowel classifications used
by Vicens do not correlate well with those obtained by either Peterson and
Barney or Forgie and Forgie. First of all, Figure 2 indicates nine vowel
categories, whereas Figures 3 and 4 show ten. Also, Vicens does not correlate
his vowel categories with particular vowel phonemes.

One reason for the poor correlation is hardware anomalies in the Vicens-Reddy
system. Zero-crossings are not counted if they fall below the threshold of
0.03 V. This causes the Z1 and Z2 frequencies to be lower than their actual
values. These lower frequencies are reflected in the different cut-off values
for the vowel categories. In addition, the three fixed front-end filters make
it difficult to obtain formant 1 and formant 2 frequencies; i.e., Z1 and Z2 can
be poor approximations to the actual formant 1 and formant 2 frequencies.

Figure 3. Formant 2 vs. Formant 1 Vowel Plot
(adapted from Peterson and Barney [5])

Figure 4. Formant 2 vs. Formant 1 Vowel Plot
(adapted from Forgie and Forgie [6])

2.3 STOP DETERMINATION

If P(i) has not previously been labeled either a fricative or a vowel, it is
labeled a stop (i.e., TYPE(i) = 3) if
    A1(i) < 12.

In other words, A1(i) is required to be in the lowest 20% of all possible A1
amplitudes. The choice of A1 (rather than A2 or A3) seems dictated by the fact
that A2 and A3 are normalized with respect to A1 and could exceed the range
0-63; however, A1 is always guaranteed to be in this range.

2.4 NASAL DETERMINATION

If P(i) has not satisfied the tests for a fricative, vowel, or stop, it
is labeled a nasal (i.e., TYPE(i) = 2) if:

    (1) A1(i) ≥ 12,

    (2) Z1(i) ≤ 5,

    (3) 3*A2(i) ≤ A1(i),
and
    (4) 3*A3(i) ≤ A1(i).

It is important to recall that a vowel is distinguished by a local maximum.
Thus, if P(i) satisfies the amplitude requirements for a vowel but not the
duration requirement (i.e., DUR(i) < 8), then a search would be made of
neighboring segments for one that is more likely to be a vowel. Such a
segment, which could satisfy the tests for both a vowel and a nasal, would
always be labeled a vowel. One possible improvement to the system could be
made by performing the vowel and nasal tests concurrently rather than
serially.
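
The tail of the serial cascade (stop, then nasal, then consonant) can be sketched as below. The function names are hypothetical, and the inequality directions are the assumed reading of the stop and nasal tests (stop: A1 < 12; nasal: A1 ≥ 12, Z1 ≤ 5, 3*A2 ≤ A1, 3*A3 ≤ A1). The sketch applies only to a sustained segment that has already failed the fricative and vowel tests.

```python
def is_stop(a1: int) -> bool:
    return a1 < 12


def is_nasal(a1: int, z1: int, a2: int, a3: int) -> bool:
    return a1 >= 12 and z1 <= 5 and 3 * a2 <= a1 and 3 * a3 <= a1


def label_non_vowel(a1: int, z1: int, a2: int, a3: int) -> str:
    """Label a sustained segment already rejected as fricative and vowel."""
    if is_stop(a1):
        return "STOP"
    if is_nasal(a1, z1, a2, a3):
        return "NASL"
    return "CNST"  # everything else falls through to "consonant"


if __name__ == "__main__":
    print(label_non_vowel(a1=30, z1=4, a2=10, a3=5))  # NASL
```

Because the stop test runs first, nasal condition (1) is always true by the time the nasal test is reached; it matters only if the tests are reordered or run concurrently.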



Nakata [7] and Fujimura [8] have experimentally derived characteristic
properties of nasals. For example, Figure 5 illustrates a spectral envelope
for a typical /m/. Note that the lowest resonant frequency (formant 1) is
in the range 200 to 300 Hz, which corresponds to 4-6 in the Vicens-Reddy
system.

Figure 5. Spectral Envelope for a Typical /m/
(adapted from Nakata [7])

Condition (2) for a nasal requires that Z1(i) ≤ 5. Since
    3 ≤ Z1(i) ≤ 5,

we have that Z1(i) is either 3, 4, or 5, which corresponds closely to the range
of 4 to 6. The remaining criteria appear to have been developed heuristically
and no further explanation will be offered.

It  appears  from  Figure  5  that  the  process  of  nasal  determination  could  be 
improved  by  adding  a  requirement  that  the  highest  resonant  frequency  (formant  3) 
be  around  3000  Hz,  i.e.,  that  Z3  be  approximately  60. 
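
The frequency correspondences quoted in this document (200 to 300 Hz for 4 to 6, 3000 Hz for 60, 3000 to 4500 Hz for 60 to 90) are all consistent with one zero-crossing unit per 50 Hz. The helper below is a sketch built on that inferred scale factor; the names `hz_to_z` and `z_to_hz` and the constant are hypothetical and do not appear in the program.

```python
HZ_PER_UNIT = 50  # inferred from the correspondences quoted in the text


def hz_to_z(hz: float) -> int:
    """Convert a frequency in Hz to the nearest zero-crossing value."""
    return round(hz / HZ_PER_UNIT)


def z_to_hz(z: int) -> int:
    """Convert a zero-crossing value back to an approximate frequency in Hz."""
    return z * HZ_PER_UNIT


if __name__ == "__main__":
    for hz in (200, 300, 3000, 4500):
        print(hz, "Hz ->", hz_to_z(hz))
```

Under this scale, the suggested formant 3 requirement of about 3000 Hz becomes Z3 of about 60.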

2.5 CONSONANT DETERMINATION

If P(i) has not satisfied the tests for a fricative, vowel, stop, or
nasal, then it is labeled a consonant (i.e., TYPE(i) = 1) if it satisfies the
sole condition that it is a sustained segment.

3. SECONDARY CLASSIFICATION

At the completion of primary classification, each P-segment has been labeled.
However, the linguistic label "burst" has not yet been assigned. Secondary
classification begins by combining appropriate adjacent fricatives and stops.
Various fricatives are then identified as "bursts." Next, appropriate
transitionals, consonants, nasals, and stops are labeled "burst." Finally,
bursts adjacent to other bursts or fricatives may be combined on the basis of
tests given below.

The P-matrix is recompacted, and a final determination of the beginning and
ending segments is performed.

3.1 COMBINING OF ADJACENT FRICATIVES AND STOPS

Adjacent fricatives and adjacent stops are combined on the basis of the
conditions illustrated in Table 2, where it is assumed that the operations are
performed for i = 3, ... , SIZEP.

Table 2. Rules for Combining Adjacent Fricatives and Stops

                                                Case
Condition                1      2      3      4      5      6      7      8      9      10
----------------------   -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
TYPE(i)                  FRIC   FRIC   FRIC   FRIC   FRIC   STOP   STOP   STOP   STOP   STOP
TYPE(i-1)                FRIC   FRIC   FRIC   FRIC   FRIC   STOP   STOP   STOP   STOP   STOP
CLO(i) < -12                                                NO     NO     NO     NO     NO
|DUR(i)-DUR(i-1)| ≤ 4    YES    YES    YES    NO     NO     YES    YES    YES    NO     NO
DUR(i) < DUR(i-1)                             YES    NO                          YES    NO
NAT(i)                   SUST   SUST   TRAN                 SUST   SUST   TRAN
NAT(i-1)                 SUST   TRAN                        SUST   TRAN
----------------------   -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
Actions To Be Performed  1,2,4  ?      ?      1,4    ?      1,2,4  1,3,4  1,4    1,4    ?

(Entries shown as ? are illegible in the source.)

Note: The condition CLO(i) < -12 appeared in an earlier version of the program
as CLO(i) < -4.

The following actions are to be performed in conjunction with the table:

1. DUR1 = DUR(i) + DUR(i-1).

2. Recompute the parameter values of P(i-1) for A1, Z1, A2, Z2, A3, and Z3.
   This calculation is shown below, using A1 as an example:

   A1MN(i-1) = min {A1MN(i-1), A1MN(i)},

   A1(i-1) = [A1(i-1)*DUR(i-1) + A1(i)*DUR(i)] / [DUR(i-1) + DUR(i)],

   A1MX(i-1) = max {A1MX(i-1), A1MX(i)}.

3. For columns 2 through 22 of the P-matrix, set

   P(i-1) = P(i).

4. DUR(i-1) = DUR1,

   SXT(i-1) = min {SXT(i-1), SXT(i)},

   move all the P-matrix rows up one row, and set

   SIZEP = SIZEP - 1.
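
Action 2 amounts to a duration-weighted average together with a running minimum and maximum. The sketch below illustrates this for a single parameter such as A1; the `Param` record and `merge_param` function are hypothetical names, and floating-point division is an assumption (the program may well use integer arithmetic).

```python
from dataclasses import dataclass


@dataclass
class Param:
    mn: float    # minimum over the segment (e.g. A1MN)
    mean: float  # average over the segment (e.g. A1)
    mx: float    # maximum over the segment (e.g. A1MX)


def merge_param(prev: Param, cur: Param, dur_prev: int, dur_cur: int) -> Param:
    """Combine one parameter of P(i-1) and P(i) as described in action 2."""
    total = dur_prev + dur_cur
    weighted = (prev.mean * dur_prev + cur.mean * dur_cur) / total
    return Param(min(prev.mn, cur.mn), weighted, max(prev.mx, cur.mx))


if __name__ == "__main__":
    merged = merge_param(Param(10, 12, 15), Param(8, 18, 20), dur_prev=2, dur_cur=4)
    print(merged)
```

Weighting by duration makes the merged average equal to the average that would have been computed over the combined span of minimal segments.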

We shall illustrate the use of this table by considering the following example:
suppose that

    TYPE(i) = FRIC and TYPE(i-1) = FRIC.

We then check to see if

    |DUR(i) - DUR(i-1)| ≤ 4.

If it is, we check NAT(i) and NAT(i-1). If both are SUST, actions 1, 2, and 4
above are performed.

3.2 IDENTIFICATION OF APPROPRIATE FRICATIVES AS BURSTS

For i = 2, ... , SIZEP, we label P(i) a burst (i.e., TYPE(i) = 4) if P(i) has
previously been labeled a fricative (i.e., TYPE(i) = 5), it satisfies the
condition that*

    5*DUR(i) + 2*Z3(i) ≤ 150,
and either:

    (1) DUR(i) ≤ 6,
or
    (2) DUR(i) ≥ 5 and A3(i) ≤ A1(i),
or
    (3) DUR(i) ≥ 3 and A3(i) ≤ A2(i).

3.3 IDENTIFICATION OF APPROPRIATE TRANSITIONALS, CONSONANTS, NASALS,
    AND STOPS AS BURSTS

P(i) (i = 2, ... , SIZEP) is labeled a burst (i.e., TYPE(i) = 4) if
TYPE(i) = 0, 1, 2, or 3 (i.e., P(i) is already either a transitional,
consonant, nasal, or stop) and Z3(i) ≥ 60 or A1(i) ≤ 16.


*In an earlier version of the program, this condition appeared as
5*DUR(i) + 2*Z3(i) ≤ 140.

However, if either

    Z3(i) < 60 or A1(i) > 16,

and any one of the following six sets of conditions is satisfied, we also label
P(i) a burst*:

    (1) 40 ≤ Z3(i) ≤ 50,
        Z3(i) + Z2(i) ≤ 60,
        A1(i) + A2(i) < 20, and
        A1(i) < 6.

    (2) 40 ≤ Z3(i) ≤ 50,
        Z3(i) + Z2(i) ≤ 60,
        A1(i) + A2(i) < 20,
        A1(i) > 6, and
        A3(i) ≥ A1(i).

    (3) Z3(i) > 50,
        A1(i) + A2(i) < 20, and
        A1(i) < 6.

    (4) Z3(i) > 50,
        A1(i) + A2(i) < 20,
        A1(i) > 6, and
        A3(i) ≥ A1(i).

    (5) Z3(i) ≤ 50,
        Z3(i) + Z2(i) > 60,
        A1(i) + A2(i) < 20, and
        A1(i) < 6.

    (6) Z3(i) ≤ 50,
        Z3(i) + Z2(i) > 60,
        A1(i) + A2(i) < 20,
        A1(i) > 6, and
        A3(i) ≥ A1(i).


*The condition 40 ≤ Z3(i) ≤ 50 in (1) and (2) appeared in an earlier version of
the program as 45 ≤ Z3(i) ≤ 50. Also, the condition A1(i) < 6 in (1), (3), and
(5) and the condition A1(i) > 6 in (2), (4), and (6) originally used a larger
two-digit threshold beginning with 1 (the second digit is illegible in the
source).

Halle, Hughes, and Radley [9] have noted that stop bursts may be characterized
as follows:

    /p/ and /b/ (the labial stops) have a high concentration of energy
    around 500 - 1500 Hz;

    /t/ and /d/ (the postdental stops) have either a flat spectrum or
    have high energy concentrations above 4000 Hz and around 500 Hz;

    /k/ and /g/ (the palatal and velar stops) have high concentrations
    of energy around 1500 - 4000 Hz.

The above data were obtained from an analysis of energy spectra of the phonemes
/p/, /b/, /t/, /d/, /k/, and /g/. Closer examination of these spectra reveals
that the third formant frequency for /k/ and /g/ is characteristically between
3000 Hz and 4500 Hz, which corresponds to

    60 ≤ Z3 ≤ 90

in the Vicens-Reddy system. As stated above, a transitional, consonant, nasal,
or stop with the property that

    Z3 ≥ 60

is relabeled a burst. In this case, a reasonably close correlation exists.
However, no correspondence exists between the remaining tests for a burst and
the characterizations given in [9].

3.4 COMBINING BURSTS ADJACENT TO OTHER BURSTS OR FRICATIVES

The entire P-matrix, beginning with P(2), is searched for burst segments. When
such a segment has been found, the nearest previous segment (which has
not been previously combined into another burst segment) is examined to
determine whether it is a burst or a fricative. If so, then the burst segment
P(i) is combined with the previous segment by adding DUR(i) to the duration
of the previous burst or fricative segment. If the previous segment is a burst,
then its new duration is tested and, if it is greater than or equal to 80 ms,
the TYPE of the segment is changed from 4 (i.e., burst) to 5 (i.e., fricative).

Independent of whether or not P(i) was combined with a previous segment, P(i+1)
is examined to determine if it is a burst or a fricative. If it is, then P(i)
is combined with P(i+1) by adding DUR(i) to DUR(i+1) and resetting the beginning
Q-segment of P(i+1) to point to the beginning Q-segment of P(i). Again, if
P(i+1) is a burst, the new duration is tested and, if it is greater than or equal
to 80 ms, TYPE(i+1) is changed from 4 to 5.

A possible result of this combining procedure is that a burst segment P(i) could
be combined with a fricative or burst preceding it and also with a fricative or
burst following it. The resulting total duration of the two segments would then
be erroneous, since DUR(i) is counted twice. A more detailed description of this
procedure can be found in Figure 6.
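
The double counting can be reproduced with a small sketch of the combining rule. Several simplifications are assumed here: durations are in 10-ms units (so 80 ms is DUR = 8), the "previous segment" is taken to be the immediately preceding row, a combined segment is flagged with TYPE = -1, and the Q-segment pointer update is omitted. The function name `combine_burst` is hypothetical.

```python
BRST, FRIC = 4, 5  # type numbers from Table 1


def combine_burst(types, durs, i):
    """Combine the burst at index i with burst/fricative neighbors, in place."""
    for n in (i - 1, i + 1):
        if 0 <= n < len(types) and types[n] in (BRST, FRIC):
            durs[n] += durs[i]                 # add DUR(i) to the neighbor
            if types[n] == BRST and durs[n] >= 8:
                types[n] = FRIC                # burst of 80 ms or more becomes a fricative
            types[i] = -1                      # assumed flag for a combined segment


if __name__ == "__main__":
    types = [FRIC, BRST, BRST]
    durs = [10, 3, 4]
    combine_burst(types, durs, 1)
    print(types, durs)
```

In this example the original durations total 17 units, but the two surviving segments total 13 + 7 = 20, because the 3 units of the middle burst were added to both neighbors.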


3.5 DETERMINATION OF BEGINNING AND ENDING SEGMENTS

The P-matrix is recompacted by suppressing all segments P(i) for which
TYPE(i) = -1 (recall that all segments so flagged were previously combined with
adjacent segments). Let k denote the row number of the last row of the
recompacted P-matrix.

Beginning with P(k), the P-matrix is examined backwards from i = k to i = 2 as
follows: If either:

    (1) P(i) is a stop
or
    (2) P(i) is not a stop, burst, or fricative but
        4*DUR(i) + 2*A1(i) < 36,

then P(i-1) is examined similarly until we find a P(i) which satisfies
neither (1) nor (2). Such a P-segment is either:

    (1) a burst or

    (2) a fricative or

    (3) not a stop, burst, or fricative but
        4*DUR(i) + 2*A1(i) ≥ 36.

If P(i) is a burst, then we set

    SIZEP = i and DUR(i) = max {6, DUR(i)}.

This implies that the ending segment is a burst of duration not less than 60 ms.

If P(i) is a fricative, then we set

    SIZEP = i and DUR(i) = min {12, DUR(i)}.

This implies that the ending segment is a fricative of duration less than or
equal to 120 ms. If DUR(i) is now < 10
or
    5*DUR(i) + Z3(i) ≤ 110
or
    A1(i) + A2(i) < 8,
then we set

    TYPE(i) = 4,

i.e., the fricative P(i) is relabeled a burst.

If P(i) is not a stop, burst, or fricative but
    4*DUR(i) + 2*A1(i) ≥ 36,
then we set

    SIZEP = min {k, i+1}.

This means that if i = k, the speech sample ends with a consonant, nasal, vowel,
or transitional. However, if i < k, the sample ends with the segment following
P(i). This may be a vowel, nasal, or consonant which did not pass the test,
or a stop.
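
The backward scan for the ending segment can be sketched as follows. The sketch makes several assumptions: segments are plain dictionaries in a 0-based list rather than P-matrix rows, the returned index plays the role of SIZEP, the third case is read as 4*DUR + 2*A1 ≥ 36, and the scan simply stops at the first row if every segment is skippable (a case the text does not cover). All names are hypothetical.

```python
TRAN, CNST, NASL, STOP, BRST, FRIC = 0, 1, 2, 3, 4, 5


def skippable(seg) -> bool:
    """True if the backward scan passes over this segment."""
    if seg["type"] == STOP:
        return True
    weak = 4 * seg["dur"] + 2 * seg["a1"] < 36
    return seg["type"] not in (BRST, FRIC) and weak


def find_ending(segs) -> int:
    """Return the index of the last retained row; may edit that row in place."""
    k = len(segs) - 1
    i = k
    while i > 0 and skippable(segs[i]):
        i -= 1
    seg = segs[i]
    if seg["type"] == BRST:
        seg["dur"] = max(6, seg["dur"])          # ending burst lasts at least 60 ms
        return i
    if seg["type"] == FRIC:
        seg["dur"] = min(12, seg["dur"])         # ending fricative lasts at most 120 ms
        if (seg["dur"] < 10 or 5 * seg["dur"] + seg["z3"] <= 110
                or seg["a1"] + seg["a2"] < 8):
            seg["type"] = BRST                   # short or weak fricative becomes a burst
        return i
    return min(k, i + 1)                         # keep one trailing segment, if any
```

For instance, a strong vowel followed by a weak consonant and a stop yields an ending index one past the vowel, so the sample ends with the weak consonant and the trailing stop is dropped.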

To determine the beginning segment of the P-matrix, if P(2) is a stop and
DUR(2) is greater than 5, then the beginning Q-segment of P(2) is defined to
be SBG(2) = SBG(2) + DUR(2) - 5 and we set DUR(2) = 5.

A more detailed description of the handling of beginning and ending segments
can be found in Figure 7.

Figure 6. Flow Chart for Combining Bursts Adjacent to Other Bursts or Fricatives

Figure 7. Flow Chart for Determination of Beginning and Ending Segments

4. CONSTRUCTION OF THE R-MATRIX

The results of primary and secondary classification will now be used to construct
an array called the R-matrix, or feature matrix. This matrix will be used in the
lexicon lookup portion of the program to identify a spoken message.

The R-matrix consists of 10 columns and a maximum of 40 rows. Let R = (r_{i,j}),
where i = 1, ... , m and j = 1, ... , 10, where m ≤ 40. The first row of R is
defined as follows:

    r_{1,1}  = number of vowels in the message,
    r_{1,2}  = number of fricatives in the message,
    r_{1,3}  = an unused position of the array,
    r_{1,4}  =
    r_{1,5}  = row number of first* vowel appearing in message,
    r_{1,6}  = row number of second vowel appearing in message,
    r_{1,7}  = row number of third vowel appearing in message,
    r_{1,8}  = row number of fourth vowel appearing in message,
    r_{1,9}  = row number of fifth vowel appearing in message,
    r_{1,10} = an octal pattern representing the sequence of vowels and
               fricatives in the message; an octal "1" represents a vowel,
               and an octal "2" represents a fricative.


*If the message contains only one vowel, r_{1,6} = r_{1,7} = r_{1,8} = r_{1,9} = 0.
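
The first row can be assembled from the sequence of type numbers alone. The sketch below is illustrative and rests on assumptions: the function name `first_row` is hypothetical, the octal pattern is packed with the earliest segment in the most significant digit, unused vowel positions are set to 0 in every case, and column 4 (whose definition is missing above) is left as a 0 placeholder.

```python
FRIC = 5


def is_vowel(t: int) -> bool:
    return 6 <= t <= 14  # vowel types 1-9 occupy type numbers 6-14


def first_row(types):
    """Build row 1 of the R-matrix from the type numbers of rows 2..m."""
    vowel_rows = [r for r, t in enumerate(types, start=2) if is_vowel(t)]
    n_fric = sum(1 for t in types if t == FRIC)
    pattern = 0
    for t in types:
        if is_vowel(t):
            pattern = pattern * 8 + 1   # octal digit 1 for a vowel
        elif t == FRIC:
            pattern = pattern * 8 + 2   # octal digit 2 for a fricative
    rows5 = (vowel_rows + [0] * 5)[:5]  # first five vowels, zero-filled
    # Columns 3 and 4 are placeholders: column 3 is unused, column 4 is unspecified.
    return [len(vowel_rows), n_fric, 0, 0] + rows5 + [pattern]


if __name__ == "__main__":
    row = first_row([5, 6, 3, 5])       # fricative, vowel, stop, fricative
    print(row[:9], oct(row[9]))
```

For the sequence fricative, vowel, stop, fricative, the pattern under these assumptions is octal 212, and the single vowel is recorded at row 3.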

The remaining rows of the R-matrix are defined as follows for i = 2, ... , m:

    r_{i,1} = alphanumeric phonemic label of P(i) (see Table 1 for the four-
              character phonemic labels),
    r_{i,2} = DUR(i), the length of P(i) in minimal segments,
    r_{i,3} = A1(i),
    r_{i,4} = Z1(i),
    r_{i,5} = A2(i),
    r_{i,6} = Z2(i),
    r_{i,7} = A3(i),
    r_{i,8} = Z3(i),
    r_{i,9} = SXT(i).

APPENDIX

Vowel Phonemes as Adapted from Reddy [10]

PHONEME    AS IN
-------    ------
i          eve
I          it
ɛ          met
æ          at
ɝ          bird
ʌ          up
ɑ          father
ɔ          all
U          foot
u          boot

Note: e as in "mate" and o as in "obey" are not
included because they are considered to be
diphthongs.